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Abstract 

The most widely used algorithm for floating point complex division, known as Smith's 
method, may fail more often than expected. This document presents two improved 
complex division algorithms. We present a proof of the robustness of the first improved 
algorithm. Numerical simulations show that this algorithm performs well in practice 
and is significantly more robust than other known implementations. By combining 
additionnal scaling methods with this first algorithm, we were able to create a second 
algorithm, which rarely fails. 
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1 Introduction 

In floating point arithmetic, designing a robust complex division algorithm is 
surprisingly diflicult. We mainly focus in this article on double precision IEEE 
754-2008 binary 64 floating point numbers, since these are the floating point 
numbers used in Scilab. 

Indeed, it is easy to present numerical experiments where a naive, straight- 
forward implementation fails to produce an accurate answer. This type of failure 
is caused by the -I- d? term which appears in the denominator of the complex 
division expression. This intermediate term can overflow, which is the main 
reason why the naive method fails to work in practice for a large range of input 
values. 

Smith's 1962 method |17) is based on an algorithm which greatly improves 
the robustness of complex division in floating point arithmetic. This algorithm 
is used in Scilab, and in many other numerical computing softwares as well. 
Actually, this algorithm is not as robust as we might think. It is easy to flnd 
particular complex divisions where Smith's method fails. 

In these cases, as we are going to see soon, the failure is complete in the 
sense that none of the digits in the result are correct. In the current paper, 
we "only" try to get the correct magnitude in the results, but this is a difficult 
challenge in the case of the complex division. 

Many authors have recognized this fact, but we found that few authors 
actually provide better algorithms. Stfll, in 1985, Stewart [111I19] provided an 
algorithm which improves the robustness of Smith's algorithm, by considering 
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all the ways of computing a product oi • 02 • 03 which appears in an intermediate 
expression of Smith's complex division. Stewart proved that, if oi, 02 and 03 are 
three floating point numbers such that the product oi ■ 02 ■ 03 is a floating point 
number, then there is a way to compute the product which avoids underflows. 
There are exactly three ways of computing this product, namely (oi • 02) • 
03, ai ■ {a2 ■ fla) and 02 ■ (ai • 03). Stewart's idea was to use the appropriate 
expression, by multiplying the number of largest magnitude with the one with 
smaller magnitude. 

Stewart's algorithm works in more cases than Smith's. But the formulation 
of the algorithm is significantly more complicated. This increased complexity 
may have discouraged developers to use this improved algorithm. In practice, 
Smith's method is still the most widely used algorithm. Moreover, we guess that 
most developers did not feel the need for improving their algorithm, probably by 
the lack of evidence that their complex division algorithm fails in situations of 
practical importance. The randomized numerical experiments that we present 
in this paper may make clearer the fact that a complex division may fail more 
often than expected. 

In this article, we present an improved Smith's algorithm, which performs 
significantly better than the original algorithm and is significantly simpler than 
Stewart's algorithm. More precisely, our algorithm does not require to compute 
the product oi • 02 • 03 in three different ways. In fact, we prove that one of the 
three ways is useless, which simplifies the algorithm. 

In order to perform the numerical comparisons, we created a Scilab module 
which implements all the algorithms presented in this paper. This module is 
available at 

http : //f orge . s c ilab . org/ index . php/ p/ compdi v 

The functions are provided under the CeCiLL, an open-source license which is 
GPL-compatible. The technical report [I] is provided within the project, under 
the Creative Commons licence. 

We compared our method with other algorithms and found that most al- 
gorithms which claimed for improved accuracy or improved performance were 
in fact significantly less accurate than expected. More precisely, our numerical 
experiments suggest that the rate of failure of our improved algorithm is as low 
as Stewart's, and might be 4 orders of magnitude lower than a naive method, 
and 2 orders of magnitude lower than the other known implementations. 

This improved algorithm can still fail in a significant number of situations. 
We used extensive randomized numerical experiments to create a collection of 
difficult complex divisions. Indeed, using randomly generated complex divisions 
overcomes the problem of trying all possible combinations of arguments. Indeed, 
this is not possible with four arguments, each being a double, because this would 
lead to a too large number of experiments. 

This work continues our analysis of the algorithms that the first author 
began in |6]. In this document, we presented an example of a failure of Smith's 
algorithm on an example inspired by Stewart (this example is presented in the 
section [275]) . Some months after the public release of this document on the web. 
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the second author [T7] suggested to the first author to use a modified formula 
which is at the core of the algorithm presented in the section [3l 

This paper is organized as follows. In the first section, we present the naive 
approach and present particular examples of failure of this method. Then we 
present Smith's 1962 method and analyze failure cases of this method. We 
present the lack of reproducibility of some particular complex divisions, caused 
by the use of extended precision on some processors. We present a review of 
other known complex division algorithms and analyze their main features. In 
the second section, we present our improved algorithm and a proof which states 
that the implementation is, in some sense, minimum. We present numerical 
results and compare our improved implementation with five other algorithms. 
Then we analyze this improved algorithm and present a second, much more 
robust algorithm. 

2 Naive and non-naive algorithms 

In this section, we present the path from a naive implementation of the complex 
division to the more robust implementation suggested by Smith in 1962. 

2.1 The binary 64 doubles 

Before beginning our journey within the complex division algorithms, we present 
in this section the parameters of the floating point numbers which are used in 
Scilab, that is, we present binary 64 doubles. 

In Scilab, we use double precision floating point numbers, which corresponds 
to the binary 64 floating point numbers of the IEEE-754-2008 standard. In 
this floating point system, the significand makes use of p = 53 binary digits of 
precision and 11 binary digits in the exponent. The parameters of the IEEE 
754 double precision floating point system are presented in the figure [TJ 

The benefits of the IEEE 754 2008 standard [8] is that it increases the porta- 
bility of programs which are making use of floating point numbers, and especially 
those written in the Scilab language. Indeed, on all machines where Scilab is 
available, the radix, the precision and the number of bits in the exponent are 
the same. As a consequence, the machine epsilon e is always equal to the same 
value for doubles. This is the same for the largest positive normal and the 
smallest positive normal which are always equal to the values presented in the 
figure [TJ The situation for the smallest positive subnormal a is more complex, 
but, in this paper, we assume that gradual underflow is available. 

Doubles have intrinsic limitations which can lead to rounding, overflow or 
underflow. Indeed, not all mathematical numbers x can be represented as float- 
ing point numbers. We denote by fl{x) the floating point representation of x. 
In this paper, we are mostly concerned with overflow and underflow. Indeed, 
if x is a number such that \x\ < a/2, then the floating point representation 
of X is fl{x) — 0, i.e. x underflows. Notice that, if a/2 < \x\ < a, then the 
double which is closest to |a;| is a, which implies that x is represented by ±a. 
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On the other hand, if a; is a number such that > 17, then the floating point 
representation of x is fl{x) = Inf , i.e. x overflows. 

2.2 The naive method 

Consider the two complex numbers a-\-ih and c -I- id, where a,b,c and d are four 
real numbers and i is the imaginary number which satisfies = —1. We would 
like to divide these two complex number, that is, we would like to compute 
2-t|^ = e + if, where e and / are real numbers. We all know the equation 

a + ib ac + bd be — ad 
= |- i 

c + id + d"^ + d'^' 

where, by hypothesis, + d^ is nonzero. 

The Scilab function compdiv_naive presented in the figure [5] is a straight- 
forward implementation of the previous formulas. It takes as input the complex 
numbers x — a + ib and y = c + id, and returns z = e + if . 

The complex function creates a complex number from its real and imaginary 
parts. In IEEE arithmetic, all invalid operations return a Nan, meaning Not-A- 
Number. In the function compdiv_naive, we call the complex function to avoid 
to produce Nan numbers when this is possible. In the Scilab language, the 
variable 7,nan contains a quiet Nan, which is designed to propagate through all 
operations without signaling an exception. There are several simple operations 
which produce quiet Nans, including 0/0 and 7,inf-7,inf .. 

We avoid using the expression r=x+7,i*y and use the complex function in- 
stead. Indeed, depending on the input arguments, this might produce expres- 
sions such as 7Dinf +7Di*7.inf . This involves the expression 7»i*7»inf , which is 
executed as the multiplication of 0+7oi and 7oinf+7oi*0. The interpreter then 
distributes the product, so that the real part is 0*7inf-l*0, which generates a 
7onan. 

While the equation [T] is correct in exact arithmetic, it may fail when we 
consider fioating point numbers. Hence, a naive implementation based on the 
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Fig. 2: Naive algorithm for the complex division. 

function z = compdi v_nal ve (x,y ) 

a = real(x); b = imag(x); 

c = real(y); d = imag(y); 

den =c*c+d*d 

e=(a*c + b*d)/ den 

f=(b*c-a*d)/ den 

z = complex (e,f) 
endf unction 



previous formula can easily fail to produce an accurate result, as we are going 
to see in the next section. 



2.3 Failure of the naive method 

Consider the complex division 

(1 + + «2i"23) « 2-i°23 _ ,2-"'2:\ (2) 

which is accurate to more than 17 decimal digits and is exact with doubles. 
It is easy to check that the naive algorithm produces zero (for both the real 
an imaginary part) instead of the correct result: there is no significant digit 
in the result. Notice that both the input numbers a = l,6 = l,c=l and 
d = 2i°23 a,nd the outputs e = 2-i°23 a,nd / = -2-i023 ^re representable 
as double precision floating point numbers. We now focus on the floating point 
representation of the intermediate expressions. The floating point representation 
of den is fl{den) = fl{2^°'^^) = Inf, because 2^046 jg larger than O. Then, the 
e and / variables are computed as 

//(e) = //((I X 1 + 1 X 2i023)/7„/) = 0, 
fl{f) = fl{{lxl-lx2'''^^)/Inf) = 0. 

(3) 

Hence, the result is computed without any significant digit, even though both 
the input and the output numbers are all representable as double precision 
floating point numbers. 

The second test involves small numbers in the denominator of the complex 
fraction. Consider the complex division 

(1 + i)/(2-i023 + i2-i023) = 21023. (4) 

It is easy to check that the naive algorithm produces Inf + Nan i instead of 

the correct result. To get this result, we must first use the statement ieee(2), 
which configures Scilab so that Inf and Nan numbers are generated instead of 
Scilab error messages when divisions by zero are performed. 
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Fig. 3: Smith's algorithm for the complex division. 

function z = coinpdiv_sinith ( x , y ) 
a = real(x); b = imag(x); 
c = real(y); d = imag(y); 
if ( abs(d) <= abs(c) ) then 
r = d/c 

den = c + d * r 
e=(a+b*r)/ den 
f = (b - a * r) / den 

else 

r = c/d 

den = c * r + d 

e = (a * r + b) / den 

f = (b * r - a) / den 

end 

z = complex(e,f) 
endf unction 



In the previous case, we have a=l,6=l,c = 2 ^"^"^ and d = 2 This 
leads to 

fl{den) = /?(2-2046 + 2-2"46) ^ 0, 

//(e) = //((I X 2-i"23_^i X 2-i023)/o) = /n/, 

//(/) = //((I X 2-l"23 _ I X 2-l°23)/0) 

= f 1(0/0) = Nan. 

The two previous complex divisions can be computed accurately if we use 
Smith's algorithm, which is presented below. 

2.4 Smith's method (1962) 

In this section, we analyze Smith's method, which produces a more accurate 
division of two complex numbers. 

In Scilab, the algorithm for the complex division is done by the wwdiv rou- 
tine, which implements Smith's method [17) in the Fortran language. Smith's 
algorithm is based on normalization, which calculates the complex division even 
if the input terms are large or small. 

The starting point of the method is the mathematical definition [T] We 
have seen that the term -I- may generate overflows or underflows. This 
is caused by intermediate expressions whose magnitudes may be larger than 
necessary. The previous numerical experiments suggest that, provided that we 
had simplifled the calculation, the intermediate expressions would not have been 
unnecessary large. Hence, Smith's idea consists in rewriting the expressions so 
that we avoid writing the term + d"^. More precisely, if the magnitude of c is 
greater than the magnitude of d, then the original numerator and denominator 
are multiplied by 1 — i{d/c) instead of the usual c — id. 
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The compdiv_smith function presented in figure [3] implements Smith's method 
in the Scilab language. It is easy to check that Smith's method performs very 
well for the relatively difficult complex divisions that we met earlier. 

2.5 Robustness of Smith's algorithm 

In this section, we present an example where Smith's method does not perform 
as expected. 

The following example is inspired by an example by Stewart [18 . While 
Stewart gives an example based on a machine with an exponent range ±99, we 
consider an example which is based on Scilab's doubles. 

Consider the complex division 

(21023 + i2-1023)/(2677 _^ ^2^677) ^ 23^6 _ ^2~^°"» , (5) 

which is accurate to more than 17 decimal digits and is exact with doubles. The 
following Scilab script compares the naive implementation. Smith's method and 
Scilab's division operator. The session is performed with Scilab v5.2.0 under a 
32 bits Windows using a Intel Xeon processor. 

-->x = 2-1023 + '/.i * 2--1023; 
-->y = 2-677 + y.i * 2--677; 
-->z = 2-346 - y.i * 2--1008; 
-->z = coinpdiv_naive ( x , y ) 
z = 
Nan 

-->z = compdiv_sinith ( x , y ) 
z = 

1 . 43D+104 
— >x/y 
ans = 

1.43D+104 - 3.64D-304i 

In the previous case, the naive implementation does not produce any correct 
digit, as expected. Smith's method, produces a correct real part, but an inaccu- 
rate imaginary part. Once again, Scilab's division operator provides the correct 
answer. 

We check that Smith's algorithm is not accurate in this case. We have 
a = 2i°23^ = 2-1023^ ^ ^ 2^"^^ and d = 2"^'^^ and the algorithm performs the 
following steps. 

//(r) = /Z(2-6^72677) ^ /?(2-i354) = 

fl{den) = /?(267^ + 2-677 X 0) = 2^7^ 
fl{e) = /?((2i023 + 2-i"23 X 0)/2f577) 

= /^(2l023/2677) ^ j;(2346) 

sa 1.43 X 10^°'' 
fl{f) = /Z((2-i023 _ 21023 X 0)/267^) 
= //(2-1023/2677) ^ /;(2-i700) 

= 
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We see that the variable r underflows, so that it is represented by zero. This 
simplifies the denominator den, but this variable is still correctly computed, 
because it is dominated by the term c = 2^^^. The real part e is still accu- 
rate, because, once again, the computation is dominated by the term a. The 
imaginary part / is wrong, because this term should be dominated by the term 
ar. Since r underflows, it is represented by zero, which completely changes the 
result of the expression b-a*r, which is now equal to 6 = 2-i"23^ Therefore, the 
result is equal to 2^i'^^'^/2^^^, which underflows to zero. 

Numerical experiments indicate that the most common cause of failure of 
Smith's algorithm is produced by the underflow of products such as a*r or b*r. 
This is a direct consequence of the inequality \r\ < 1, which implies that r the 
probability that underflows is high. But not all underflows of r are producing 
large relative errors. Assume, for example, that \d\ < \c\. It can be proved [1] 
that the underflow of the produce d*r in the evaluation of den does not produce, 
by itself, a large relative error. On the other hand, it can be proved that the 
absolute error of e and / can be large if both a*r, b*r and d*r underflow. This 
problem is the motivation for an improved algorithm which is presented in the 
section [S] 

Since Scilab makes use of Smith's formula, it should fail in this case. But 
we have seen that Scilab is still able to perform accurately. This unexpected 
accuracy is analyzed in the next section. 

2.6 Extended precision 

In this section, we analyze the cause of the good behavior of Scilab's division 
operator in the previous case. 

The figure [4] presents the results of the previous complex division on sev- 
eral numerical computation softwares, including Scilab and Octave. Here, the 
exact result is Exact= 1.433+104-3.646-3041, while the approximate result is 
Approx.=l .433+104. 

The cause for the differences in the results is the use of extended precision 
by some processors, depending on the compilation options and depending on 
the platform [T31[TS]. Indeed, processors of the IA32 architecture (Intel 386, 
486, Pentium and compatibles) have a "x87" floating point unit, which has 
80-bit registers. This "double extended" format uses a 64-bit mantissa and 
a 15-bit exponent. In general, when the compiler generates the code for the 
IA32 platform, temporary variables are stored in the x87 registers. The range, 
from 2-16382 20-4932 21^383 ^ 104931^ of the double extended format is 
much larger than the range of the binary 64 doubles and, therefore, protects the 
algorithms against potential underflows or overflows. 

Still, it is possible to perform computations based on strict doubles if we use 
the SSE2 extensions. These extensions introduced one 128-bit packed floating- 
point data type, which consists of two IEEE binary 64 floating-point numbers. 

Depending on the compilers options used to generate the binary, the com- 
puter may use either the SSE unit (with binary 64 floats), or the x87 unit (with 
80-bits registers). On Linux, Scilab is compiled with the Gnu compiler, gcc [7]. 
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Fig. 4: Result of (2l023 + , . 2-1023)/(2677 + i . 2-677). 
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The default value of the -mfpmath option for 1386 machines is to use the x87 
floating point co-processor. On the other hand, on x86_64 machines, the default 
uses the SSE instruction set. On Windows, Scilab 5.2.0 is compiled with Visual 
Studio and the "/arch:IA32" option. This makes Scilab run on older Pentium 
computers that do not support SSE2 [2]- here, Scilab may use the x87 unit. On 
the other hand, Scilab 5.2.0 uses the SSE2 unit on Windows 64 bits systems. 
Scilab is not compiled with the "/arch:IA32" option on Windows since 2010 [3]. 

The issue here is that not all systems provide extended precision, which is an 
optional part of the IEEE-754 standard. Moreover, the programmer may have 
troubles with programs which produce a different result in Debug or in Release 
(i.e. optimized) mode. Hence, it is highly desirable to use an algorithm which 
provides accurate results by using only standard doubles. This is why several 
authors suggested more robust complex division algorithms, as we are going to 
see in the next section. 

2.7 A review of algorithms 

In this section, we present an overview of four different algorithms for complex 
division. This analysis reveals, among other things, that many algorithms are 
using pre and post scalings to avoid unnecessary overflows or underflows. 

The limits of Smith's method have been analyzed by Stewart in [18]. The 
paper separates the relative error of the complex numbers and the relative error 
made on real and imaginary parts. Stewart's algorithm is based on a theorem 
which states that if xi,X2, ■ ■ ■ ,Xn are n floating point representable numbers, 
and if their product is also a representable floating point number, then the 
product imni^i^nixi) ■ maxi=i,„(xi) is also representable. The algorithm uses 
that theorem to perform a correct computation. 

Moreover, Stewart's algorithm uses the fact that (6 -I- ia)/{d + ic) — e ~ if. 
In Stewart's algorithm, the case \d\ < \c\ is managed explicitely. Instead, in the 
case \d\ > |c|, Stewart flrst switches a and 6, then c and d, and uses the same 
algorithm as for the first branch. In the case where a switch was performed, the 
statement f =-f is used to get the correct result. Since the switch operation and 
the change of the sign in f do not produce a rounding error, we can use them 
safely. 
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First, we notice that Stewart's algorithm performs weh in the complex divi- 
sion [5j This can be expected from the algorithm, which is able to prevent most 
overflows caused by multiplications. Stewart's algorithm is much better than 
Smith's algorithm. Unfortunately, as we are going to see later in this paper, 
Stewart's algorithm fails in many cases. 

In the ISO/IEC 9899 :TC3 C99 standard 0, section G.5.1 "Multiplicative 
operators", the authors present a _Cdivd function in the C language which 
implements the complex division. According to Kahan |10| (in the Appendix 
"Over/Underflow Undermines Complex Number Division in Java"), this code 
is due to Jim Thomas and Fred Tydeman. The algorithm is based on a pre- 
scaling of c and d, such that their base 2 exponent is made zero. This scales the 
denominator + (P and avoids most overflows or underflows. The post-scaling 
produces then the correct result. This scaling is based on a power of 2, which 
avoids rounding. Only in the case of an IEEE exception, the algorithm recom- 
putes the division, taking into account for Nans and Infinities. The code does 
not defend against overflow and underflow in the calculation of the numerator. 
We will show later in this paper that the C99 algorithm performs nearly as well 
as Smith's algorithm, but fails in many cases. 

In [12] . Li et al. present a complex division algorithm with scaling. The 
algorithm is made of two stages, where the external stage scales the numerator 
and denominator if they are two small or too large, and the internal stage 
is Smith's algorithm. An error bound is presented for this algorithm, which 
is presented in the appendix B, "Smith's Complex Division Algorithm with 
Scaling" of the technical report [11] (but not in the paper [12]). The scalings are 
designed to avoid overflows and underflows by scaling up or down the arguments. 
Since the scaling is based on 16 = 2^, the associated multiplications do not 
produce rounding. Unfortunately, Li et al.'s algorithm is unable to fix the 
failure that we have analyzed for Smith's method. Moreover, the scaling used 
in Li et al.'s algorithm fails in many cases, as we are going to see later in this 
paper. 

Priest published in 2004 a complex division algorithm based on scaling [TC] . 
This scaling is designed to avoid overflow and harmful underflow. The scaling 
requires only four floating point multiplications and a small amount of integer 
arithmetic to compute the scale factor. The scaling is based only on the values of 
c and d. Priest shows that choosing a scaling factor close to |c -I- id\~^^^ works 
well in most situations. As we are going to see later in this paper. Priest's 
algorithm does not perform well in many cases. 

3 An improved algorithm 

In this section, we present an improved complex division algorithm and prove 
that the algorithm is, in some sense, minimum. We first present the algorithm in 
the Scilab language, then prove that this algorithm uses the minimum number 
of branches required to evaluate a specific intermediate expression accurately. 
Then we present numerical experiments which indicate that this algorithm in- 
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Fig. 5: Improved algorithm for the complex division. 

function z = c oinpdiv_ improved ( x, y ) 
a = real(x); b = imag(x) 

c = real(y); d = imag(y) 

if ( abs(d) <= abs(c) ) then 

[e,f] = improved_internal (a , b , c , d) 

else 

[e,f] = impr o ved_ int ernal (b , a , d , c ) 
f = -f 

end 

z = complex (e , f ) 
endf unct i on 

function [e,f] = impr oved_ int ernal ( a , b , c , d) 
r = d/c 

t = l/(c + d * r) 
if (r <> 0) then 

e=(a+b*r)*t 

f = (b-a*r)*t 

else 

e = (a + d * (b/c)) * t 
f = (b - d * (a/c)) * t 

end 
endf unction 



deed performs significantly better than other known algorithms in practical 
situations. 

3.1 Analysis of the algorithm 

The Scilab function compdiv_improved presented in the figure [5] is an improved 
complex division algorithm. 

Assume that \d\ < \c\. In this case, Smith's method was to compute 

e ^ {a + b * r)/den, f = {b — a * r)/den, (6) 

where den = c + d * r. 

We notice that we divide two times by den. Instead, we may use the formula 

e = {a + b * r) * t, f = {b — a * r) * t, (7) 

where t — l/(c + (i*r). This replaces two divisions by one division and two 
multiplications, which may be faster. On the other hand, let us consider a 
system with gradual underflow. If \den\ is below the underflow threshold, that 
is if |c + d * r| G [a, fi[, then the two calculations are not the same: t overfiows, 
while the expression {a+b*r)/den may produce a correct result. As we are going 
to see in the randomized numerical experiments of the section this does not 
have a significant effect on the overall robustness of the algorithm. This comes 
from the fact that this type of error occurs only with extreme inputs, near the 
underflow threshold. Therefore, we favor the speed, in this case. 
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We have seen that Smith's formula may fail when the expression r = d/c 
underflows. Then, the product a * r is evaluated as zero, which may produce a 
very inaccurate value oi f = {b — a * r) * t. One possible approach is to consider 
the three different ways of evaluating the expression a* r = a* {d/c), namely 
a * r = a * (d/c), d * (a/c) and [d * a)/c. Using the appropriate expression is 
essentially the idea of Stewart. 

All in all, this leads to three possible ways of evaluating e and /: from the 
original equation 



or from 



or from 



e = {a + h*{d/c))*t, f ^ {b - a * (d/c)) * t, (8) 
e = {a + d*{b/c))*t, f ^ {b - d * (a/c)) * t, (9) 
e = (a + (d*fe)/c) / = (6- (dH<a)/c)) *L (10) 



We are going to prove that, if both the equations [5] and M fail to evaluate 
e and /, then the equations [TUl cannot succeed. Hence, we are going to prove 
that there is no point in trying to evaluate the expressions in[TUl as Stewart's 
algorithm does. 

More precisely, the following proposition proves that there is no point in 
evaluating e by the left equation infTOl 

Proposition 3.1. (^Improved Complex Division^ Assume that a > is a real 
positive number, fl > a is a real positive number, u is the unit roundoff and 
e = 2u is a small positive number such that < 4e < 1 — u representing the 
machine epsilon. Assume that 

a = . (11) 

{i-u)n ^ ' 

Assume that b,c,d are four real numbers in the range [a, f2]. We assume that 

d<c. (12) 



If 



d , , 

- < a, (13) 

- < a, (14) 

c 



therefore 



bd 

— <a. 
c 



(15) 
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We emphasize that the equation[TT]is satisfied by the binary 64 floating point 
system. Notice that a simplified but still interesting result can be obtained if we 
ignore gradual underflow, and consider only numbers in the range [jj j Since 
the proof is not much more complicated with 1111 we consider it directly. 

Let us analyze the assumptions of the previous proposition. First, the as- 
sumption [m states that we are in the case where algorithm should use the 
expressions [3 [9] or [lOl Second, the condition [13] states that the expression [8] 
may have failed to produce an accurate e, because r = d/c underflows. Sec- 
ondly, the condition [T4l states that the expression [9] may have failed to produce 
an accurate e, because b/c underflows. Finally, the condition [TSl states that the 
product bd/c underflows, so that evaluating e with the equation 1101 is useless. 

Proof. The proof is by contradiction. Let us assume that 

bd ,^ 
a<— , (16) 

c 

i.e. let's assume that bdjc is a floating point number. By the inequality [T31 we 
have 1 < By the inequality [111 we have 

(IT, 

Therefore, \ <b. We multiply both sides by and get 

-^r~ ^ ^^^^ 

Now, the condition [TD implies — < c. From the hypothesis [TTJ we substitute for 
a in the previous equation, and get 

b- — < c. (19) 

4e 



We plug the ineaualitv [T51 into [TOl and get 

(1 -u)rj 



4e 



< c. (20) 



By hypothesis, we have < 4e < 1 — u. Therefore, 1 < -^t^i which implies 
ri < '•"'"^"^^ . We plug the previous inequality into [20l and get 57 < c. The 
previous inequality is not possible, since, by hypothesis, c e [a, 57]. Hence, the 
inequality [inj is false, which concludes the proof. □ 



The proposition 13.11 provides the most difficult part of the proof. But it only 
considers the case where 6, c and d are positive, does not consider the compu- 
tation of / and does not consider the case d > c. In fact, it is straightforward 
to derive these results. 

Let us prove that the proposition 13.11 can be extended to three numbers 5, c 
and d in the range [—51, —a] U [a, 51]. Let us consider the absolute values |6|, |c 
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Fig. 6: A collection of difficult complex divisions. 
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and These three numbers are in the range [a, where the proposition 13. II 
can be applied. Let us assume that \d\ < |c|, that \d/c\ < a and that \b/c\ < a. 
Then, the proposition 13 . f I implies that |6(i/c| < a, i.e. the expression {b*d)/c 
underflows. 

Let us prove that there is no point in evaluating / by the right equation in 
[TOl We notice that the evaluation of / in the right part of [TOl involve the product 
{a* d)/ c. It is easy to see that we can apply the proposition 13.11 by replacing b 
by a, which leads to the result. 

Finally, the case d > c is similar, since it suffices to switch first a and 6, then 
c and d and to take the opposite of /. 

The following session shows that the improved algorithm works well in the 
same case (from the section where the original Smith's algorithm failed. 

-->x = 2-1023 -I- '/.i * 2--1023; 

-->y = 2-677 -^ '/.i * 2--677; 

-->z = 2-346 - '/.i * 2--1008; 

-->q = c oinpdiv_ improved (x , y ) 

q = 

1.43D-H04 - 3.64D-304i 

The previous numerical experiment indicates that the improved algorithm 
perform well in the case for which it is designed. We now have to explore its 
behavior in other situations, which is done in the next section. 

3.2 Several difficult complex divisions 

In this section, we present a collection of difficult complex divisions, and analyze 
the behavior of the improved algorithm on these particular cases. 

The figure |6] presents some difficult complex divisions. The real and imagi- 
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nary parts of the exact result for the cases 7, 9 and 10 are 

Z7 = 3.898125604559113300 x 10^^^ 

+i ■ 8.174961907852353577 x lO^^^, 

zq = 0.001953125-^-0.001953125 
zio = 1.02951151789360578 x 10"*** 

+i ■ 6.97145987515076231 x 10"^^°. 

The accuracy acc of a computed real number compared with an expected 
real number is computed with the formula acc — floor(— log2(7'e)) where log2 is 
the base-2 logarithm and the relative error is computed with re — \computed — 
expected\ /\expected\. With 53 bits of precision with doubles, acc is an integer 
in the range [0,53], where acc = corresponds to a completely wrong result 
and ac = 53 is the maximum possible accuracy. The accuracy of a complex 
computed result is the minimum of the accuracy of the real and imaginary 
parts, i.e. the worst accuracy is displayed. 

The figure [7] presents the results of several complex division algorithms on 
the difficult complex divisions that we have presented. 

We now analyze these particular complex divisions in more detail. 

We have already analyzed the cases 1 and 2 in the section [^751 and the case 
3 in the section [^31 

The case 4 is a typical failure case for Smith's algorithm, caused by overflow 
in a sum in the expression fl{e) = fl{{a + h*r)/den) = f I {2^^'^^ + 2^^'^^ * I) 12 = 
Inf /2 = Inf. The same issue makes Stewart's and the Improved algorithm fail. 
The scaled algorithms, such as Li et al.'s and Priest's, work in this case. In Li et 
al.'s algorithm, the downscaling of a and b by the expressions a=a/16 and b=b/16 
reduces the magnitude of these numbers. Then the expression e=(b+a*r)*t 
works fine, so that Li et al.'s algorithm produces the correct result. 

The case 5 occurs when / is close to the underflow threshold. In Smith's al- 
gorithm, the computation is: fl{r) = flid/c) = fl{2~'^^^/2^^^) = 0. The imagi- 
nary part is then evaluated with fl{e) = /l((8.52 x lQ-'^^^)l{2m x lO^^^)) = 0. 
In Li et al.'s algorithm, the inputs a and h are downscaled, but then Smith's 
algorithm fails with the same result as previously. In the Improved algorithm 
(and in Stewart's), the underflow of r is avoided by the use of the expression 
f=(b-d*(a/c))*t, which succeeds. 

The case 6 occurs when e is close to the underflow threshold. We notice 
that Smith's and the Improved algorithms both succeed. On the other hand, we 
notice that Li et al.'s algorithm fails and produces a zero real part. Since Smith's 
algorithm succeeds, this means that the extra-scaling in Li et al.'s algorithm 
actually results into a lower accuracy. Indeed, Li et al.'s algorithm down scales 
a and h with the expressions a=a/16 and b=b/16. When computing the real part 
e, the algorithm uses fl{e) = /Z((2.647 x lO'^^) * (4.66 x lO'^o^)) = o. When 
Li et al.'s algorithm finally upscales the real part with e=e*16, the underflow of 
e has already occurred and this cannot rescue the correct result anymore. 

The cases 7 and 10 will be reviewed in the section 
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Fig. 7: Result of various complex division algorithms on particular inputs. 
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It is interesting to sec that, in Li et al.'s algorithm, the downscaling makes 
the case 4 work, but makes the case 6 fail. 

3.3 Random samplings 

It is in fact difficult to analyze the algorithms with a case-by-case method. 

The reason is that adding a particular treatment for a potential failure might 
generate a failure in another case. The downscaling of a and b in Li et al's 
algorithm is an example of this kind of issue. 

The Linux 32 bits machine we have for these tests is making use of the x87 
registers. Hence, Scilab's internal complex division operator "/", which makes 
use of Smith's 1962 algorithm, always produces the correct result in practice. On 
the other hand, a Scilab script which implements a complex division based on 
elementary real arithmetic produces consistently the same results, because the 
values are regularly stored into doubles. In this situation, the use of x87 registers 
and their extended precision has no effect on these algorithms. On this Linux 
machine, we can then consider that the complex division (a+yoi*b)/ (c+y«i*d) 
is the "reference", while the Scilab-based algorithms are the algorithms under 
test. By the way, notice that the behavior of the scripts simulates the behavior 
of a similar C (or Fortran) source code which would be compiled in such a way 
that it uses the SSE registers instead of the extended precision. 

With the goal of automatically finding counter-examples, we experimented 
the following brute-force search for failures. It is not possible to try all possible 
combinations of a, b, c and d, because this would lead to (2^^)* = 2^^^ w lO''^ 
combinations, which is much too large. When the space of the input variables 
is too large, it is common to perform Monte-Carlo simulations. The idea of our 
experiment is to randomly choose a sign, to arbitrarily pick the unit significand 
1, and to randomly choose an exponent in the range allowed by the floating 
point binary 64 IEEE format. 

We consider divisions with a = So2"<', b = S62"S c = S62"<= and d = 5^2"**, 
where Sa,Sb,Sc and Sd are uniform random integers in the set {— 1,-|-1}, and 
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Fig. 8: Result of various complex division algorithms with inputs (a, &, c, d) hav- 
ing uniformly random exponents in { — 1074, —1073, . . . , 1023}. A lower 
failure probability is better. 
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Ua, ni,, ric, rid are uniform random integers in the set { — 1074, —1073, . . . , 1023}. 
These integers correspond to the range of exponents available with doubles, 
where subnormal numbers (in the range {—1074, —1073, —1023}) are in- 
cluded. 

We perform N complex divisions, where N a large integer, e.g. N — 1000000. 
We count the number of times T a particular algorithm, say Smith's 1962 for 
example, produces a different result from Scilab's division using the extended 
precision of x87 registers. Under the assumption that Scilab/x87 result is al- 
ways correct, then T is a count of the failures of Smith's 1962 algorithm. The 
probability of a failure p is then estimated hy p = T/N. 

When the probability is not too far away from p = 0.5, this method is 
relatively accurate at estimating a probability of failure. When the probability 
p is very close to 0, then the events are rare so that the Monte-Carlo simulation 
needs more simulations in order to be accurate. We can estimate the variance of 
the result with V{T/N) = p{l—p) /N . The accuracy of the estimated probability 
p can then be evaluated by computing the confidence interval 

(^/N - lM\lv{T/N), T/N + lM\lv{T/N)^ , (21) 

which contains the true probability 95% of the times. 

The figure |8] presents the results of various algorithms in this numerical 
experiment. The experiments required from TV = 10000 to TV = 100000 simu- 
lations, depending on the failure probability (smaller probabilities require more 
simulations for a given accuracy). 

We notice that the naive algorithm fails quite often, with a failure probability 
close to 50% in this experiment! The Smith algorithm reduces the failure proba- 
bility down to approximately 1%. Li et al.'s algorithm only marginally improves 
the robustness. This can be expected from the algorithm, which scales up or 
down only for extreme exponents. The improved algorithm improves radically 
on Smith, reducing the failure probability down to approximately l.e-4. 
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Stewart's algorithm and our improved division algorithm are associated with 
the same failure probability, as expected from the theory. The difference in the 
probabilities displayed in the table [S] is only an effect of the Monte-Carlo experi- 
ment, where the probability of failure is itself a random variable: the confidence 
intervals do not show any significant differences between these algorithms. 

3.4 The need for a robust algorithm 

In this section, we present failure cases for the improved algorithm. We present 
the causes of the failures, and analyze ways to get the correct result. The 
complex divisions we consider were presented in section [3.21 

We first consider the case 4, where the improved algorithm produces an 
infinite number, instead of the exact finite result. Notice that the exact result 
z = 2^°^-^ « 8.98 X 10^°'' is very close to fi. The ratio r = d/c achieves its 
maximum value, that is, r = 1. Then we compute t = \/2 and the expression 
for e overflows, while / is correctly evaluated as zero. The cause of this failure 
is the evaluation of the expression a+b*r, which overflows. We can easily check 
that using the expression a*t+b*r*t instead of (a+b*r)*t fixes the problem. 
This is because t is small, in this case, so that the sum of the two expressions 
a*t and b*r*t produces the correct result. 

Another solution is to downscale the numerator x, as in Li et al.'s algorithm. 

It is straightforward to check that using the statement x=x/2, then calling 
the improved division z=compdiv_improved(x,y) , and finally back-scaling with 
z=z*2 fixes the problem. In Li et al.'s algorithm, the downscaling is done by a 
factor 16 instead of 2, even if downscaling with 2 also works. 

We then consider the case 7, where the improved algorithm produces infinite 
real and imaginary parts. The analysis of the steps of the algorithm shows that 
t overflows, which causes both e and / to overflow. 

The fact that t overflows is explained by the fact that r is small in this 
case, such that c + dxrwc^ 2^^""^^. But t w 1/c = 2^'^'^'', which is larger 
than ri: t overflows. In this case, upscaling the denominator y, as done in Li 
et al.'s algorithm, solves the problem. More precisely, we compute the scaling 
factor S=2/7oeps~ 2, upscale the denominator with y=y*S, perform an improved 
complex division, and finally backscale the result with z=z*S. 

-->S = 2/*/.eps"2; 
-->y = y * S 

y = 

2.75D-281 + 1.31D-287i 
-->z = c oinpdiv_ improved (x , y ) 
z = 

9.61D+257 + 2.01D+264i 
-->z = z * S 
z = 

3.89D+289 + 8.17D+295i 

In the case 10, the improved algorithm produces a correct real part, but the 
imaginry part is evaluated as 7.08D-220 instead of the exact result 6.97D-220. 
The cause is in the evaluation of /, where the formula fails to be accurate. The 
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Fig. 9: Robust external algorithm for the complex division. 

function z = coinpdiv_robust ( x, y ) 
a = real(x); b = imag(x); 
c = real(y); d = imag(y); 
AB = max (abs ( [a b] ) ) 
CD = max (abs ( [c d] ) ) 
B = 2 
S = 1 

OV = number _propert ie s (" huge " ) 
UN = number _propert 1 e s (" t iny " ) 
Be = B/%eps"2 

if ( AB >= OV/2 ) then // Scale down a, b 
X = x/2; S = S*2; 

end 

if ( CD >= OV/2 ) then // Scale down c, d 

y = y/2; S = S/2; 
if ( AB <= UN*B/'/.eps ) then // Scale up a , b 

X = x*Be; S = S/Be; 

end 

if ( CD <= UN*B/'/.eps ) then // Scale up c , d 
y = y*Be; S = S*Be; 

end 

z = r obus t _ int ernal (x , y ) 
z = z * S 
endf unction 



first step which fails is the computation of the product a*r=0, which underflows. 
Hence the expression (a*r)*t evaluates as zero. We notice that, if we evaluate 
the product as (a*t)*r, we get (a*t) *r=l . lOD-221, which is accurate. Since 
r < 1, the case where the product a*r underflows can be expected to be not so 
rare. 

3.5 A robust complex division 

In this section, we present a robust complex division algorithm. This algorithm 
almost never fails completely, and produces a result which is rarely more than 
1 bit inaccurate. 

Our robust complex division algorithm is based on two steps: 

• in the first step, we downscale or upscale the numerator x and the denom- 
inator y, only if necessary, 

• in the second step, we perform a complex division, based on a robust 
algorithm. 

The first step is similar, but not identical, to the scaling in Li et al. algorithm. 
We scale only with powers of 2, so that this scaling does not produce any 
roundoff error. 

The compdiv_robust function presented in the figure [9] implements the first 
step. 
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The downscaling of x (or y) is done only when a or 6 (or c or d) is greater 
than OV/2, where OV is the overflow threshold J7. This is different from Li et 
al.'s algorithm, where the downscaling is done with the factor 16, instead of 
2. Indeed, we found no theoretical and no practical reasons to use the scaling 
factor 16. On the theoretical side, the factor 2 guarantees that expressions such 
as a + 6 * r, for example, do not overflow. This is because if |a|, |5| < 0/2, then 
\a + hr\ < since \r\ < 1. On the practical side, the factor 16 makes the case 
6 fail for Li et al.'s algorithm, while the factor 2 works well. Moreover, the 
factor 2 is sufficient to pass the examples created by Mc Laren [T3], or similar 
examples. 

The upscaling of x (or y) is similar to Li et al.'s algorithm. This step 
applies when a and 6 (or c and d) are smaller than 2*UN/yoeps, where UN is 
the underflow threshold /j, and "/.eps is the machine epsilon e (2*UN/yoeps is 
approximately equal to 2 x 10~^^^). This step is useful on floating point systems 
where the policy is store-zero instead of gradual underflow. Indeed, Demmel 
analyzed [HIS] the intermediate expressions involved in Smith's algorithm. In 
some cases where the inputs are close to the underflow threshold /it, the relative 
error produced by Smith's algorithm is large on a system with store-zero, while 
gradual underflow may produce the correct result. Moreover, even on a system 
with gradual underflow, the upscaling step may allow one to recover the exact 
result which would have otherwise been lost. 

The robust_internal function presented in the flgurc [TU] implements the 
complex division, based on a robust complex division algorithm. The func- 
tion internal_compreal computes the real part of the complex division (a + 
ib)/{c + id), assuming that |c| < \d\. This is essentially the same algorithm 
as in compdiv_improved, where the underflows of the expression b*r is man- 
aged specifically. The robust_subinternal function computes both the real 
and imaginary parts of (a -|- ib)/{c + id), assuming that |c| < \d\. We use the 
fact that the imaginary part of (a -I- ib)/{c + id) is equal to the real part of 
(6 — ia)/{c -\- id). 

3.6 Numerical experiments 

In this section, we analyze the numerical results of the robust complex division 
algorithm. 

We consider the difficult complex divisions that we have presented in the 
section lS^ Our numerical experiments show that the compdiv_robust function 
produces the maximum possible accuracy (i.e. 53 binary digits) for all cases 1 
to 10, with the exception of the case 8 which is accurate to 52 bits only. Hence, 
the robust complex division compdivjrobust almost never fails to produce the 
exact result in these cases. 

We performed randomized experiments, with the same method which has 
been presented in the section [331 We considered several sets of 100 000 exper- 
iments, which all lead to the same following result. 

The probability that the robust complex division compute less that 52 sig- 
nificant digits is lower than 10~®. Some complex divisions did not produce 
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Fig. 10: Robust internal algorithm for the complex division. 

function z = robust_internal (x , y) 
a = real(x); b = iinag(x) 

c = real(y); d = imag(y) 

if ( abs(d) <= abs(c) ) then 

[e,f] = robust.subinternal (a , b , c , d) 

else 

[e,f] = robust.subinternal (b , a , d , c) 
f = -f 

end 

z = complex ( e , f ) 
endf unction 

function [e,f] = robust_subinternal (a , b , c , d) 

r = d/c 

t = l/(c + d * r) 

e = internal_compreal (a ,b , c , d , r , t ) 

a = -a 

f = internal_compreal (b , a , c , d , r , t ) 

endf unction 

function e = int ernal _ compr eal (a , b , c , d , r , t ) 
if (r <> 0) then 
br = b*r 

if ( br <> ) then 
e = (a + br ) * t 

else 

e=a*t+(b*t)*r 

end 

else 

e = (a + d * (b/c) ) * t 

end 
endf unction 
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the exact result, which event is associated to the 95% probabihty interval 
[2.46 X 10~^, 1.35 X 10"'']. We emphasize that, for these difficult complex divi- 
sions, only the last bit is wrong for the real or imaginary parts: all the other 
bits in both the real and imaginary part are correct up to the 52th bit. 

We have found extremely rare cases, with probability lower than 10~^, which 
are associated with a failure of the robust algorithm. In the following example, 
the real part of the exact division is equal to the underflow threshold a, but the 
result of the algorithm has a zero real part. 

-->x = 2--912+/',i*2--1029; 
-->y = 2--122 + y.i*2-46; 
-->z = 2--1074+y.i*-2--958; 
— >coinpdiv_robust(x,y) 
ans = 
- 4.10D-289i 

Notice that the exact real part is 2.470328229206233817... x IQ-^^'^, for which 
the closest double is a, although the number is itself stricly lower than a. In this 
case, the scaling by a power of 2 is not used, since the arguments are well within 
the bounds. In the robust_internal function, the algorithm detects that the 
expression b*r underflows. Therefore, it uses the expression e=a*t+(b*t)*r 
where both the terms (a*t)*r and b*t underflow. In fact, both terms are 
lower than a/ 2: only their sum is slightly greater than a/2. 

3.7 Performance 

In this section, we analyze the performances of the algorithms that we have 
presented previously. Since they are in the Scilab language, performance can 
be obtained, provided that we vectorize the algorithm. On the other hand, 
Scilab's complex division operator is based on a compiled Fortran source code. 
Therefore, in order to make the comparisons clearer, we created compiled source 
codes in the ANSI C language, based on the Scilab prototypes. We did this for 
Priest's algorithm, the Improved complex division algorithm and for the Robust 
complex division algorithm. 

In order to compile these source codes, we have to pay attention to the 
compilation options that we use. Indeed, our goal is to see the behaviour of our 
algorithm when we use binary 64 floating point numbers only: we avoid making 
use of the extended precision. The computer for this experiment is an Intel 
Pentium M at 2 GHz. We used a Linux 32 bits operating system, Ubuntu 10.04 
LTS, where we used the GCC compiler and the -mfpmath—sse -msse2 options. 

The algorithms are tested with a dataset with size N = 1574802 and random 
numbers a, 6, c and d uniform in [0,1]. This dataset size has been chosen so 
that performing the N experiments for one algorithm requires more than 0.3 
seconds. The performance is measured 10 times, after which we compute the 
average time. We compute the MCDPS performance measure, which is the 
number of Millions (i.e. 10^) of Complex Division Per Seconds, based on the 
average time. The figure[Tl]presents the performance of various complex division 
algorithms. 
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Fig. 11: Performances of various complex algorithms. 
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The compiled Improved function is a little faster than Scilab. The perfor- 
mance difference is about 5%. Wc do not have an explanation for this difference 
and consider that this is negligible in practice. The compiled Robust is slower 
than Scilab. This can be explained because the algorithm is much more com- 
plex. The performance difference is about 33%. The compiled Priest algorithm 
is a little slower than Scilab. The performance difference is about 5%. 

4 Conclusion 

In this paper, we have presented two algorithms for floating point complex 
division. We have designed an improved algorithm which takes into account 
for potential overflows or underflows in intermediate expressions involved in the 
complex division. This algorithm does not explicitly manage all the possible 
ways of performing a product of three terms, as was done in Stewart's algorithm. 
Hence, our algorithm is signiflcantly simpler than Stewart's, but provides the 
same numerical accuracy. 

Numerical experiments indicate that the improved algorithm performs much 
better than a naive implementation and produce fewer errors than Smith's 1962 
algorithm. Its robustness is the same as Stewart's algorithms, and is much 
better than Li ct al.'s. Priest's or the C99 algorithms. This can be expected 
in particular from Li et al.'s algorithm, which improves the accuracy only for 
extreme inputs. 

Based on this preliminary algorithm, we have designed a robust algorithm, 
which, most of the times, produce more than 52 signiflcant digits in both the real 
and imaginary parts. While this complex division algorithm is extremely robust, 
it can fail in cases where the output is very close to the underflow threshold a. 
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